---
title: "Cache"
description: "Reduce latency and save costs by caching on the edge"
---

Caching, by temporarily storing data closer to the user at the edge, significantly speeds up access time and enhances application performance. This edge deployment ensures low latency, resulting in faster responses and an efficient app development process.

![](/images/caching.png)

### Quick start

To get started, just set `Helicone-Cache-Enabled` to true in the headers, or use the Python or NPM packages to turn it on via parameters.

<CodeGroup>

```bash Curl
curl https://oai.hconeai.com/v1/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Helicone-Cache-Enabled: true' \
  -d '{
    "model": "text-davinci-003",
    "prompt": "How do I enable caching?",
}'
```

```python Python w/Package
response = openai.Completion.create(
	model="text-davinci-003",
	prompt="How do I cache with helicone?",
	cache=True,
)
```

```python Python w/o Package
openai.api_base = "https://oai.hconeai.com/v1"

openai.Completion.create(
    model="text-davinci-003",
    prompt="How do I cache with helicone?",
    headers={
      "Helicone-Cache-Enabled": "true",
    }
)
```

```ts Node.js (w/ package)
import { HeliconeProxyConfiguration as Configuration } from "./core/HeliconeProxyConfiguration";
import { HeliconeProxyOpenAIApi as OpenAIApi } from "./proxy_logger/HeliconeProxyOpenAIApi";
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
  heliconeMeta: {
    // ... other meta data
    cache: true,
  },
});

const openai = new OpenAIApi(configuration);
```

```ts Node.js (w/o package)
import { Configuration, OpenAIApi } from "openai";
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
  basePath: "https://oai.hconeai.com/v1",
  defaultHeaders: {
    "Helicone-Cache-Enabled": "true",
  },
});
const openai = new OpenAIApi(configuration);
```

</CodeGroup>

The default caching limit is `7 days` , if you want a longer cache please refer to the [Cache Parameters](#cache-parameters) section and add the Cache-Control header to your request.

### Limitations

The max time limit for a cache limit is `365 days`

### Cache Parameters

`Helicone-Cache-Enabled (required)`: This will enable storing and loading from your cache

`Cache-Control (optional)`: Allow you to configure based on the [Cloudflare Cache Directive](https://developers.cloudflare.com/cache/about/cache-control#cache-control-directives), currently only max-age is supported, but we will be adding more configuration options soon.

Example of setting the cache to `2592000 seconds` aka `30 days`:

`"Cache-Control": "max-age=2592000"`

### Cache Buckets

You can increase the size of the cache bucket, so that after the n'th request we randomly choose from a previously cached element within the bucket.

Here is an example with a bucket size of 3

```
openai.completion("give me a random number") -> "42"
# Cache Miss
openai.completion("give me a random number") -> "47"
# Cache Miss
openai.completion("give me a random number") -> "17"
# Cache Miss

openai.completion("give me a random number") -> This will randomly choose 42 | 47 | 17
# Cache Hit
```

### Configuring Bucket size

Simply add `Helicone-Cache-Bucket-Max-Size` with some number to choose how large you want your Bucket size to be.

Note: The max number of caches you can store is `20` within a bucket, if you want more you will need to upgrade to an enterprise plan.

<CodeGroup>

```python Python
openai.api_base = "https://oai.hconeai.com/v1"

openai.Completion.create(
model="text-davinci-003",
prompt="Say this is a test",
headers={
"Helicone-Auth": f"Bearer {HELICONE_API_KEY}",
"Helicone-Cache-Enabled": "true",
"Helicone-Cache-Bucket-Max-Size": "5",
}
)
```

```typescript TypeScript
import { Configuration, OpenAIApi } from "openai";
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
  basePath: "https://oai.hconeai.com/v1",
  defaultHeaders: {
    "Helicone-Cache-Enabled": "true",
    "Helicone-Cache-Bucket-Max-Size": "5",
  },
});
const openai = new OpenAIApi(configuration);
```

</CodeGroup>

### Adding Cache Seed

You can add a seed to your cache, so that you can have a deterministic cache, this is useful for when you want to have a consistent cache across multiple requests.

```python
  defaultHeaders: {
    "Helicone-Cache-Seed": "user-123"
  },
```

Add a header called `Helicone-Cache-Seed` with a string value for the seed.

```
# Bucket size 1

# Cache Seed "user-123"
openai.completion("give me a random number") -> "42"
openai.completion("give me a random number") -> "42"

# Cache Seed "user-456"
openai.completion("give me a random number") -> "17"

# Cache Seed "user-123"
openai.completion("give me a random number") -> "42"

# Cache Seed "user-456"
openai.completion("give me a random number") -> "17"
```

### Cache Response Headers

When cache is enabled the following headers will be returned.
```ts
helicone-cache:	"HIT" | "MISS"
helicone-cache-bucket-idx: number
```

Ex extracting headers from python with OpenAI:

```python
client = OpenAI(
    api_key="<OPENAI_API_KEY>",
    base_url="http://oai.hconeai.com/v1",
    default_headers={ 
        "Helicone-Auth": f"Bearer <API_KEY>",
    }
)

# add `.with_raw_response` here
chat_completion_raw = client.chat.completions.with_raw_response.create(
    model="gpt-4-vision-preview",
    messages=[
        {"role": "user", "content": "Hello world!"}
    ],
    extra_headers={
        "Helicone-Cache-Enabled": "true" # Cache enabled
    },
)

# This is the original parsed response as expected...
chat_completion = chat_completion_raw.parse()


cache_hit = chat_completion_raw.http_response.headers.get(
    'Helicone-Cache')

print(cache_hit) # Will print "HIT" or "MISS"

```

